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Abstract 

The appearance of large geolocated communication datasets has recently increased our under¬ 
standing of how social networks relate to their physical space. However, many recurrently reported 
properties, such as the spatial clustering of network communities, have not yet been systematically 
tested at different scales. In this work we analyze the social network structure of over 25 million 
phone users from three countries at three different scales; country, provinces and cities. We con¬ 
sistently find that this last urban scenario presents significant differences to common knowledge 
about social networks. First, the emergence of a giant component in the network seems to be con¬ 
trolled by whether or not the network spans over the entire urban border, almost independently 
of the population or geographic extension of the city. Second, urban communities are much less 
geographically clustered than expected. These two findings shed new light on the widely-studied 
searchability in self-organized networks. By exhaustive simulation of decentralized search strate¬ 
gies we conclude that urban networks are searchable not through geographical proximity as their 
country-wide counterparts, but through an homophily-driven community structure. 
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Introduction 


In the last decade social network analysis methods have allowed us to uncover local and 
global patterns [T], locate influential individuals |2], and examine network dynamics |3]. The 
study of macro-level social networks traces the outcomes of collective and large-scale social 
interactions such as economic development [1] , resource transfer [5] , disease transmission [6] , 
and communications [7] over a large population. In these cases, networks nodes represent 
individuals, and links are generally defined by friendships or acquaintances among them. 
Well documented structural patterns of these networks are: the positive correlations in the 
degree of adjacent nodes [S], a short diameter (increasing as the natural logarithm of the 
number of nodes) [9] , and network transitivity or clustering, which is the propensity for nodes 
pairs to be connected if they share a mutual neighbor [9]. Interestingly, social networks are 
also divided in groups or communities, and the existence of such communities alone can 
produce both degree correlations and high clustering |T0]. On the other hand, some social 
links are also the consequence of similar attributes of their nodes. Similar people tend to 
select each other they communicate more frequently and present stronger social 

interactions [7j. 

Parallel to the rise of social network analysis, and often using similar data sources, human 
mobility patterns have also considerably evolved in the recent years nani]. An interesting 
topic of study which has started to grow recently is to combine findings from both areas to 
explain the relationship between social networks and geographical space. Evidently, social 
contacts can exist only if there is the opportunity for such contacts to be created. This 
explains, for example, the ubiquitous findings showing that geographic proximity favors the 
existence of social contacts dalle]. Additionally, Network Communities (locally dense areas 
of the social graph) have been analyzed in several country scale social networks when the 
spatial positions of the nodes are known, as more and more often is the case in social networks 
resulting from information and communication technologies ml. A well-documented result 
of these communities is that they retrace national [T8l |I9| and administrative borders [23| 
when studied at a country scale. 

The spatial dispersion of social contacts at country scale has been studied in the context of 
transportation planning (see |20| and references therein). Kowald et ah made a comparative 
study of social ties and their distances, from surveyed of individuals within cities in three 
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different continents. They reported that although the ultimate models need to incorporate 
the characteristics of egos, ties, and transportation facilities, there is a general trend of a 
power law decay of social ties with distance. In this work we want to explore the group 
structure social networks in cities and its relation to space. The size of social groups have 
important implications our societies, Simmel EZH viewed the increasing size in networks 
groups as the origin of the isolation of individuals. These implications and the related 
literature are out of the scope of this work. Social networks studies within cities have 
measured the role of the density of social ties |2SII2S] or face to face encounters [221 [21] • Here, 
we are interested in the analysis of communities within cities and their relation to space. 
Despite some analysis of communities within cities [271I2H], there is still lack of knowledge on 
a clear structure of urban social networks in space. Specihcally, how connected components 
emerge with distance m in urban social networks. 

Here we assign each mobile phone user to a hxed location corresponding to his/her most 
commonly used zipcode or mobile phone tower with the goal of systematically studying the 
spatial properties of their social networks at different scales, including the formation of a 
giant component in space. The geographic distance between two nodes is then dehned as the 
distance of their respective most common locations, typically home or work. It is expected 
that within cities this distance should not be a strong limiting factor in the creation of social 
ties as it may be other factors that dehne their social distance. Social distance is given by 
differences between groups of society, including differences such as socio-demographic, race 
or social identity [3D]- Searchability is a well-established network property that relates to 
both geographic proximity and social distance: ordinary people are capable of directing 
messages only through their acquaintances and to reach any target person in only a few 
steps. Milgram [311132] hrst discovered this property, in a social experiment that routed 
letters across U.S. In the light of email communication, Dodds et ah [33] showed that when 
routing a message to a target, people selected in the hrst steps acquaintances that could 
be geographically close to that target. However, in the latests steps, participants selected 
acquaintances that could belong to the professional group of the target (i.e. socially close). 
Up to now, the network structure that makes searchability possible has not been empirically 
measured in large-scale social networks. 

We designed our study to explore the role of both social and geographic distances in 
social networks. Social distance is not trivially dehned in social networks with data passively 
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collected without much information about the attributes of the nodes. Introducing a metric 
of social distance for these cases is an interesting question, but out of the scope of this 
work. Watts et ah dehned the social distance between two nodes as the difference in 
hierarchy levels of the two smallest groups the nodes belonged to [M]. Here, we use a 
similar dehnition, proposed by Kleinberg et ah [35]: social distance S{u,v) between nodes 
u and V is the number of nodes in the smallest group containing u and v. In this work we 
dehne social groups as network communities, which are locally dense sub-networks. Networks 
communities are thus a central aspect to the analysis of social networks, being the source 
of their structural properties (degree correlation and high clustering) and consequence of 
non-structural properties, such as homophily [36]. The detection of network communities 
(modules or groups) is a difficult task that has attracted much attention in the last few years 
m Here we adopt a well-established method that detects communities by optimizing the 
Newman Girvan modularity metric |38j . 

We first present a general description of the measured social networks, with focus on 
the small-world properties and link-distance distributions. Next, we report the performance 
of different routing strategies and show that geogreedy strategies (choosing the smallest 
geographical distance to the target) are ineffective within cities while strategies based on 
social distance (choosing within the smallest community) still work. We discover two features 
of urban social networks that cause the failure of geographic strategies: urban communities 
are geographically dispersed and there is not a large connected component in groups of 
nodes defined by their geographic proximity. We further measure in the urban networks 
how the density of links P(m, v) decays with increasing group size [S) or distance. We find 
that the probability of finding a link between individuals u and n in a group of size S scales 
as P{u,v) ~ 5'“'^, with 7 < 1 when groups (S' = Sr) are defined by users living within 
geographic balls of a certain radius r. This is in contrast with observations at the national 
scale which report 7 >= 1 [Tn|. These results support the evidence that while geogreedy 
algorithms work to reach a target’s city, they fail within urban borders. In addition, we 
show that the condition 7 >= 1 still holds when groups (S' = S'c ) are defined by social 
distance. These results of urban groups defined by either social distance or geographic 
distance are in nice agreement with the analytic conditions of networks searchability [39| 
and support the results reported in routing experiments |33|. This work provides novel 
evidence of social networks: urban networks form geographically dispersed communities 
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that make them searchable. 


Results 

Network Structure 

Our data set contains information for 7 billion mobile phone interactions gathered during 
a 6 month period in France, Portugal and Spain. We report the structural network charac¬ 
teristics in table [1} These results conhrm that the networks exhibit the small world property, 
with the average number of people in the shortest path between a sender and a recipient 
(/) is 6.5, 6.4, and 8.4 in the different countries, similar to the values reported in previous 
works [71110]. As a sole illustration of the resulting networks, we extract the spatial distribu¬ 
tion of the most central people in the network, considering someone is more central if he/she 
is in average closer to everyone else in the graph (closeness centrality). In Fig. [^we show 
the distribution of the average graph distance between a sender and all possible recipients 
p{{l)) among the population for each country. This value is also known as the inverse of 
the closeness centrality [H] and it ranges from 3.8 to 11, so everyone in the country is in 
average within 4 hops from the most central people and within 11 of the less central ones. 
Each dot represents a mobile phone tower, which is our smallest spatial resolution. In order 
to expose the backbone of the social network, the color intensity of each mobile phone tower 
represents the closeness centrality of the most central person in that tower. Additionally, 
the links highlight the social connections only among the 50 most central people in each 
country, showing signihcant differences in the social network analyzed in the three countries 

Regarding degree distribution, our three networks present the common heavy-tail dis¬ 
tribution found in previous works with social networks msg. Degree distributions for all 
three networks are shown in Fig. (details about power-law htting can be found in Table 
SI). We note the existence of hubs (nodes with very high number of connections) in all three 
networks. In order to measure geographic proximity between individuals we need to assign 
a location to each of them. In our study, users are located in their billing zip code (Spain) 
or their most used tower (France and Portugal). Spain zip codes are geolocated according to 
geonames database, available at http://downloads.geonames.org/export/zip, and grouped 
according to latitude and longitude since some zip codes have identical coordinates. Towers 
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coordinates were provided by the carrier. In total 8, 928 different locations are available in 
Spain, 17,475 in France and 2,209 in Portugal. It is well documented that the probability 
of finding a social tie decreases with geographic proximity, regardless the proxy used to infer 
the social network: blogs na, location based social networks [43l 144] or mobile phone data 
[71 UHl 112] . In all of them the fraction of social links between nodes that are within distance 
r from each other decreases (at least in a certain range) as a power law, with exponents 
between —1 and —2. As shown in Fig. [^, our data fits this behavior for all three networks. 
Kowald et ah [20] present a careful analysis of the decaying function observing distance 
bands depending on the population, similar analysis on this data remain to further studies. 

Moreover, due to the high number of links considered we are able to observe long-range 
peaks. The reason for these peaks is the heterogeneity in the spatial distribution of pop¬ 
ulation (we observe the same peaks even if we randomize the links while keeping actors in 
the same location). Once established that the short paths exist all across the network, we 
explore the success of routing strategies at two levels: intercity and intracity. 

Exploring Routing Strategies 

In order to gather insights on the social network structure, we investigate the well-known 
searchability condition. We explore different routing strategies on the social networks de¬ 
scribed above. We separate the routing experiment into two phases: intercity routing and 
intracity routing. 

Intercity routing seeks to reach the correct city while intracity routing searches for the 
individual target within a city. Cities are defined by their administrative borders. In this 
study we consider two scales: provinces and municipalities as shown in Fig. S5. On both 
phases, we test different decentralized routing strategies which employ only information of 
neighbor nodes (also called contacts or friends). In a random search (ran), individuals route 
the message by randomly selecting a neighbor node that did not have had the message pre¬ 
viously. Geographical routing (geo) passes the message to the contact that is geographically 
closest to the final target, whereas degree routing (deg) selects the friend with the highest 
number of friends. Finally, community routing (com) forwards the message to a friend such 
that he/she belongs to the smallest community containing the target (see details in the 
Methods section). 
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Our intercity simulations results presented in Fig. indicate that both geo and com 
routing are able to reach the target cities. Moreover, the success rate depends only loga¬ 
rithmically on the population size of the destination city (Fig. S8), conhrming that both 
strategies are equally efficient. The intercity experiment can be replicated in our home- 
page Geographic strategies had already been reported successful using a half million 
bloggers network across the US ng. However, intracity routing has not been previously 
explored because both the low sample size of the network (0.15% of US population) and 
the lack of information of the coordinates of individuals within cities obliged to relax the 
modeled network structure: namely, nodes were allowed to forward messages to anyone else 
within the target city, even if they were not directly connected. In contrast, our larger pop¬ 
ulation sample (12% - 40%) and much smaller spatial resolution (mobile phone tower scale) 
allow us to explore routing inside cities using strict routing among connected individuals. 

Next, we explore routing strategies by analyzing the network properties within the geo¬ 
graphic administrative borders at two scales: provinces as upper limits (usually containing 
large cities plus suburbs) and municipalities as lower limits (see SI for details). Thus, we 
analyze the three different routing strategies in 155 social networks from the large munici¬ 
palities and all 150 provinces of the three countries. In contrast to intercity routing, routing 
inside municipalities is signihcantly more successful if the strategy uses community infor¬ 
mation (Figs. SIO - S15 show additional strategies). For different routing strategies Fig. 
shows the success rate for municipalities (hlled circles) and provinces (open circles) in each 
country as a function of the population size N; an upper limit of 100 hops was employed 
and Fig. S24 shows results with a smaller upper limit. We hnd that at both scales the 
community based routing is efficient because of the slow decay in success rate R ^ c — blnN 
(c = 2 ± 0.03 and b = 0.133 ± 0.003) and in contrast to the random strategy, which as 
expected decays almost reversely linear as i? ~ (a = 0.95 ± 0.03). Interestingly, the 
geographically based routing presents a crossover behavior between municipalities (only in¬ 
tracity routing) and provinces (including an initial intercity stage). This behavior is due to 
the fact that a province consists of several municipalities. Although the geographically based 
routing reaches the correct municipality, within the municipality this strategy fails. This ex¬ 
plains the different scaling observed for geographic routing in municipalities and provinces: 
while within municipalities the routing success rate scales similarly to the random routing 
R ~ N~°‘ (a = 0.66 ± 0.03), the province routing success rate scales similarly to community 


routing R ^ c — blnN (c = 0.82 ± 0.05 and b = 0.056 ± 0.004), but with a lower success 
rate as a consequence of its inefiicacy within municipalities. 

In the next sections we show that the failure of the geographic routing within cities lies 
in two previously unknown spatial properties of urban social networks: lack of short-range 
connectivity and geographical dispersion of urban communities. 

Connectivity collapse within cities 

A necessary condition for any geogreedy algorithm to succeed in a routing experiment is 
that the subgraph induced by the nodes located within any geographic ball of radius r must 
be connected. This is equivalent to saying that if a message headed to target user B has 
reached a user A, A and B in the same connected component within the subgraph induced 
by those nodes included in the circle whose center is in B and has radius up to A. While this 
is granted in a lattice our results show that is not necessarily the case in a real-world network 
(see Fig. &)■ We test this structure in our data using geometric and social distances. We 
divide the network into groups of size Sx using either geographic balls (while in this work 
we only consider 2D geographic circles we keep the term balls for consistency with previous 
theoretical work [32] which has been generalized to higher dimensions) of a certain radius 
r {X = r) or existing communities {X = c) A natural question emerging then is: which is 
the critical radius Tc so that geographic balls with r > Vc are likely to contain a connected 
network? Interestingly we observe that there is not a unique rc, but rather this radius is 
dehned by the size of a city, so that only geographic balls containing entire cities contain a 
connected network. 

We illustrate this fact further by calculating the size of the largest connected component 
within different radius and group sizes, performing this analysis centered in different loca¬ 
tions from the capital municipality (city) or centered in a province of the three countries. 
Fig. |4 )d shows that the fraction of nodes in the giant component is much smaller within cities 
than within provinces. Surprisingly, we find that this lack of connectivity is not caused by 
not having enough short-distance links (actually between 18% and 40% of links are within 
the same location (tower or zip-code)). When we zoom into a region of the city we find 
small highly clustered groups which form islands; the paths among these geographically 
neighboring groups exist through people living far away. 
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To better illustrate this finding we have studied all intra-tower networks in the capital 
cities and compared them to networks of the same size centered in municipalities in the 
countryside. Fig. shows the average giant component for towers and municipalities of a 
certain size. Municipalities with a given population have a larger giant component than a 
tower in a city with the same population. 

Given a fixed number of nodes, a giant component emerges more likely with a higher 
number of links and with low clustering (a link closing a triangle does not enlarge any 
connected component). As shown in Figs. and [^, both effects are present at the 
municipality level and not within towers. This explains the different giant component sizes 
between municipalities and towers. However, high clustering seems to be dominant for the 
lack of a connected component, since in Portugal the average degree is the same in towers 
and municipalities. Moreover, the small average degree does not seem to be due to lack 
of data, since the data from France presents the highest average degree at a country scale, 
while it exhibits the smallest average degree on the tower scale. 

Our results on geographic distance r agree with previous literature da 112] showing that 
the probability of two users within distance r to be connected follows P(r) ~ G However, 
this sole finding does not give us any information about the number of links between people 
within the same location (tower/zipcocode), since in principle they are within r = 0 distance. 
In order to be able to apply pure geographical models (generating links with P(r) ~ A) 
to our data, we have to randomize the position of the users around the tower’s location. A 
common assumption for mobile phone data is considering that if a call is processed by a 
tower, then that tower is the closest to the user’s location. This assumption implies that 
the geographic space can be divided according to the Voronoi diagram of the towers in that 
region. This way our randomization assigns each user a position uniformly distributed in 
the Voronoi cell it belongs to. Figure S22 shows the randomization process in Paris and 
Lisbon. After randomization, the distance r between any two users is greater than zero, so 
we can apply A models the number of predicted and present intra-tower links for the same 
number of links in the whole network. In Fig. we show that the number of observed 
intra-tower links in both cities is higher than what a pure geographical model 1 /r would 
generate (even higher than a 1/r^ in the case of Lisbon). Despite this abundance of links, 
there is no giant component, what implies that clustering plays a major effect at this level, 
producing highly clustered islands within the same tower. 
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Geographical dispersion of urban communities 


On the country scale the identihed communities are known to be highly spatially corre¬ 
lated and even redraw the administrative borders as shown in Fig. @(1 eft) where the colors 
indicate the dominant community of each mobile phone tower.This has been the motivation 
of a research line oriented to redraw the political maps according to social network features 
[T8l |23l 05]. However, in the city scale (Fig. [fright) the communities are dispersed over 
space and within the downtown area they are nearly randomly distributed. This shows for 
the hrst time that communities within cities are not geographically determined. 

These results are confirmed by the measurement of {ream) (average distance between 
two towers belonging to the same community) and (rrand) (average distance between two 
random towers), which are reported in tableDetails on the calculation of both distances 
can be found in the Methods section. While {rrand) is consistently over 4 times larger 
than {rcom) in the country scale, the two measures become much more similar within cities, 
quantitatively conhrming the visual result on Fig. 

An additional unexpected hnding is that some touristic areas break the general country¬ 
wide trend. A signiheant part of the French Riviera and the south coast of the island of 
Corsica belong to the Paris community, even if they are far away from the capital city. 
Same thing happens with Ibiza (western most Ballearic island) and Madrid. In Portugal’s 
Algarve (south coast of the country) the effect is not so clear, but there is dehnitely a higher 
community diversity in the area, and it is possible to hnd towers belonging to both Porto 
and Lisbon communities. Note that this is unlikely to be a touristic seasonal effect, because 
in France and Portugal the most used tower in a 6 month period is assigned to the user, 
and in Spain the billing zipcode is used. Since both are reasonable proxies for permanent 
residency, this effect is more likely due to urbanites who retired to the coast, and even 
become majority in certain areas, but still keep their social ties back in the large metropolis. 

Distance Metrics and Searchability in Urban Networks 

Network searchablity is related to its links density |3H ESj- The density of links P as a 
function of nodes distance S determines the necessary condition for network searchablity. 
This condition is postulated in the group model framework [35], which generalizes previous 
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results in hierarchies of social networks [3l] and spatial lattices [39]. P{u, v) is the probability 
of link existence between a pair of nodes {u,v) that are within distance S{u,v), dehned as 
the size of the smallest group containing both u and v. 

Given the distance distribution of the form P{u,v) ~ S{u,v)~'^ when 7 < 1 the social 
network is not searchable; if 7 = 1 the social network is always searchable, and if 7 > 1 the 
network can be searchable. 

We test this structure in our data using geometric and social distances. We divide the 
network into groups of size Sx using either geographic balls of a certain radius r (X = r) 
or existing communities (X = c) as illustrated in the insets of Fig Then we calculate the 
probability that two nodes that belong to the same group (being that group the smallest 
they both belong to) share a link and how this probability depends on the group size. We 
observe that both functions have the exponent close to 7 = 1 , but in the groups based on 
geography these exponents are always below 1 , while the exponent is consistently above 1 
for communities as shown in Fig. Although the group-model framework does not capture 
all of our network properties (heterogeneous degree distribution and clustering coefficient) 
we hnd that our empirical results in urban networks con&m theoretical results regarding 
the conditions for searchability of social networks. 

Discussion 

In summary, we have demonstrated that cities (as dehned conventionally by their admin¬ 
istrative borders and population size) change the structure of social networks. Interestingly, 
these hndings could be related to urban growth and the economic function of cities [25l |26] . 

Taken together, the presented results lead to the following discoveries: (i) Communi¬ 
ties within cities follow a hierarchical structure that favors social distance over geographic 
distance, (ii) While people living within geographic radius including several cities form a 
connected network, the same radius within cities leads to highly clustered components only 
connected through people in distant parts of the city. This behavior occurs across different 
cities and regions sizes, highlighting cities as functional entities of the social networks (hi) 
The structure of communities (here related to social proximity) and not geographic distance 
is what makes social networks searchable within cities. This hnding is consistent with ex¬ 
perimental results that suggest people do use the profession or name of the target in the 


12 


final steps to make inferences about his/her education or ethnicity, as a hint to help routing 
within cities [HH] . 

This work uncovers an unknown feature of social networks: while at the national level 
descriptions of social networks consist of highly connected and geographically close commu¬ 
nities, we find that geography plays only a minor role when forming communities within 
cities. Urban networks consist of geographically dispersed communities. This structure ex¬ 
plains why people are able to successfully route in Milgram-like experiments, provided they 
correctly identify the community of the target. Our results support the theoretical hypothe¬ 
sis of Kleinberg: the likelihood to find friendships within communities decays as a power-law 
with increasing community size [35], confirming that among all possible network configura¬ 
tions, humans have favored those such that a message can reach anyone even if delivered 
using only local information. This is a remarkable example of a self-organized structure 
that allows a small group of individuals to solve a complex problem by cooperating to take 
advantage of collective knowledge inisz]. 

Methods 

Data 

We analyze phone records for a six months period in three countries: France, Portugal, 
and Spain. In total 7 billion phone interactions are considered. In order to build social 
networks from this data, only links with at least one communication per direction are in¬ 
cluded. This is a common technique in the literature [3 US, HH] to avoid both marketing 
callers and misled numbers. The resulting social networks have 18.7, 1.2, and 5.9 million 
users, for France, Portugal, and Spain respectively. Further details are provided in the SI. 

Routing Algorithms 

In order to deliver the message, several strategies can be used. In the following we describe 
every criteria used in our experiments. 

RAN We use random routing as a baseline comparison, by employing depth first search 
(DFS) into a routing algorithm, we effectively avoid the message to get into infinite 
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loops. The application of DFS in the Milgram experiment is quite straightforward: 
when a participant receives a message, he/she knows the list of people who already 
got the message. The participant will never forward to none of these people, unless all 
of his/her friends are in the list. In this case, he/she will send the message back to the 
person who hrst sent the message to him. In a tree network, this would be the case 
of a branch which has been explored without success and the search process continues 
going backwards. Since our social network is far from being a tree, the number of 
rolling back events is low (less than 10“® in all of our simulations). 

GEO This procedure consists of sending the message to the friend geographically closest to 
the target. In the intercity scenario, locations are considered on the municipality level. 
In the intracity scenario, tower locations are employed. Note that this discretization 
produces a number of ties (two or more friends are at the same distance from the 
target). 

DEG In this case, the message is forwarded to the friend with the largest number of friends 
among the candidates. 

COM In order to mimic social attributes (school, work) communities are detected in the 
network. To detect communities in social networks, we use the well-established Lou¬ 
vain method HZl EZl SSHSO]. This method is a greedy optimization method that 
attempts to optimize the network modularity by aggregating nodes belonging to the 
same community and building a new network whose nodes are the communities. This 
method assigns to each person a set of communities at different hierarchical levels. 
Although the number of aggregation levels L depends on the network and it is auto¬ 
matically obtained from the algorithm, in all of our networks the algorithm provided 
between 3 and 7 aggregation levels. Note that this algorithm provides hierarchical 
communities. If two nodes i and j have a community of level I in common they will 
share as well all the communities in higher levels, formally: 

i,j e [1, ..N]/Cii = Cji -)■ Cix = Cj^Wx e[l + l ,.., L] (1) 

where N is the number of people. A person will send the message to a friend with the 
lowest possible community level in common with the target. While it is arguable that 
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community detection requires global information and such might not be available to 
participants in a Milgram-like experiment, recent research [SI] has reported that people 
are able to relate communities detected in their network to certain social attributes and 
affiliations, thus making communities a reasonable proxy for those unknown attributes 
in our data set. 

In our experiments, these criteria are combined, by using several of them to solve ties: 
this way, we will denote ran-deg to a routing scheme where hrst the already visited nodes are 
discarded from candidates (ran), and then those with the highest degree are chosen {deg). If 
there is still more than one possible friend after the routing logic is completed, the message 
is forwarded to one of these candidates at random. In our ran-deg example, this happens if 
two or more friends were not previously visited and have the same degree. 

Geographical dispersion of communities 

We found a fundamental difference between the behavior on urban scale and on the 
country one: geographical clustering turns out to be more intense in the intercity scenario 
than in the intracity one. To reach this conclusion we have calculated the spatial clustering 
of the communities by the following steps: 

• Perform a community detection on the network 

• Associate the tower to the most common community among that tower’s users. 

• Calculate the average distance (rcom) between any two towers belonging to the same 
community 

where C denotes the number of communities found, W the number of towers in com¬ 
munity c and r(a, b) the distance between towers a and b. 

• Assign communities with the same sizes randomly to the towers and calculate the 
average distance ([^ of the randomized data {rrand}- 


ij' cam ) 
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Country % GC Nodes N Links E {k) (c) (c^) {1) (fr) 

France 99.23 18.7 ■ 10® 81.3 ■ 10® 8.73 0.16 9 ■ 10"^ 8.52 7.75 

Portugal 96.23 1.21 ■ 10® 4.00 ■ 10® 6.57 0.26 5 ■ 10“^ 8.35 7.44 

Spain 95.81 5.92 ■ 10® 16.1 ■ 10® 5.44 0.21 48 ■ 10"^ 10.36 9.20 

TABLE I: Characteristic properties of the social networks in the studied countries: Size of the 
giant component (GC), number of users (Nodes) and relationships (Links), average degree (A:), 
average clustering coefficient (c), average shortest path length (/), and the corresponding values 
for random networks with the same size {cr) and (L)- 


Network 

{rcom) (km) 

{rrand) (km) 

ran) / (j" com) 

Portugal 

64.4 

240.1 

3.72 

France 

115.7 

410.71 

3.54 

Spain 

118.5 

521.2 

4.39 

Lisbon {concelho) 

3.4 

4.31 

1.26 

Paris {department) 

4.1 

5.7 

1.39 

Madrid {municipio) 

3.2 

3.46 

1.08 


TABLE II: Average distance between two towers belonging to the same community {{rcom)) com¬ 
pared to the distance when the communities are randomized {{rrand))- The geographical effect 
{jrand) jg nioi’e pronounccd in the nation-wide communities. 

V com ) 
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FIG. 1: Visualization of central places in France, Spain and Portugal. Each circle represents a 
mobile phone tower and its color (the brighter the more central) corresponds to the inverse of 
closeness centrality {1) (average number of hops to any other person) of the most central people in 
this tower. People are always assigned either to their billing address or most used tower. White 
lines highlight the social network between the 50 most central persons of each country. In the three 
insets the distribution of the (/) of all persons and the relation to the used color are also shown. 
This figure was created using Grace and Inkscape. 
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Madrid-Barcelona 



FIG. 2: Country-wide social networks structure, (a) Degree distribution for each of the country 
level networks, (b) Probability of a link to have distance r in each of the networks. Distances are 
grouped in 7 km bins. In all three countries, distribution present a power law decay (exponents 
between —1 and —1.5) up to 100 km. A large fraction of links lie within the same tower (r = 0), 
averaging 40% in Spain (red), 18% in France (blue) and 21% in Portugal (green). 
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a) 



FIG. 3: Results for different routing strategies in both stages, a) Dependence of the number of 
hops 1 on the success rate R for intercity routing (results for completing the delivery within 15 and 
100 hops are highlighted by circles), b) Success rate versus population size for three strategies in 
155 municipalities and 150 provinces. All logarithmic and power-law functions are guides to the 
eye. 
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b) 



Group Size / Population 


FIG. 4: Short range connectivity a) In a 2D lattice (left), any geographic ball contains a connected 
network, however this is not the case for any network (right) where the path between two nodes 
within a geographic ball might include nodes out of the ball if the network induced by the nodes 
within the ball is not connected, b) Fraction of nodes in the giant component as a function of the 
relative size of the geographic ball for the three capitals compared to the country-wide networks. 
Each of the 6000 dots in the figure was calculated by selecting 2 nodes u and v at random within 
a city or within the country, extracting the subnetwork defined by the ball whose center is in u 
and radius up to v, and identifying the number individuals that belonged to the giant component 
of such subnetwork. 
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FIG. 5: Connectivity collapse within cities, a) Relation between population size and fraction of 
nodes in the giant component for all towers in the capital cities (blue) and municipalities in the 
country within the same range of population (red). Errors bars represent the standard error of 
the mean The size of the connected components within municipalities tends to be higher 
than within towers of the same size, b) and c) depict the causes of this behavior, smaller average 
degree and higher clustering are the reasons why the giant component is larger in municipalities, d) 
Number of links within the same tower using several randomization models. Results are averaged 
over 10 runs. The real network has a bigger number of intra-tower links than a space independent 
graph (ER) and a ^ model. In the case of Lisbon, the real network has even more links than a ^ 
model. To explain the high number of intra-tower links the geographical distance is not sufficient, 
thus another effect like clustering is needed. 
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FIG. 6: Geographical clustering of social communities. On the country scale, towers belonging 
to the 20 biggest communities are presented in different colors and shapes. On the city scale, 
towers within each capital city are presented. On the country scale most of communities fit with 
the administrative boundaries while within cities communities do not seem to be geographically 
driven. The figure was created using R packages maptools and ggplot2. 
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FIG. 7: Comparison of the exponent 7 for the probability of finding a link between two people 
as a function of smallest common group size: p{Sx) ~ Sx'^ for 96 cities in France. Groups are 
constructed either based on geography {Sr, black) or on community {Sc, red). 
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